Skip to content

Conversation

@maxisbey
Copy link
Contributor

This PR extends the port allocation race condition fix to all test files affected by the same TOCTOU issue, eliminating intermittent test failures when running with pytest-xdist parallel workers.

Motivation and Context

When running tests with pytest-xdist parallel workers, multiple workers would receive the same "free" port from socket.bind(("127.0.0.1", 0)), causing a TOCTOU (Time-of-Check-Time-of-Use) race condition. This manifested as:

  • "address already in use" errors when multiple workers tried to bind the same port
  • Tests connecting to wrong servers (e.g., expecting one server but getting another)
  • Flaky tests in CI that would pass locally but fail intermittently in parallel runs

This builds on the initial fix in tests/server/fastmcp/test_integration.py by applying the same solution to 7 additional test files that had the identical issue.

How Has This Been Tested?

  • ✅ Ran full test suite with 14 parallel workers: PYTEST_DISABLE_PLUGIN_AUTOLOAD="" uv run --frozen pytest tests/ -v
  • ✅ All 703 tests pass consistently
  • ✅ Verified no "address already in use" errors occur
  • ✅ Confirmed no test cross-contamination between workers
  • ✅ Tested multiple consecutive runs to ensure stability

Each pytest-xdist worker now receives an exclusive port range from 40000-59999 based on its worker ID, eliminating port conflicts entirely.

Breaking Changes

None. This is an internal test infrastructure change with no impact on the public API or user-facing functionality.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have read the MCP Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Additional context

Files modified:

  • tests/server/test_streamable_http_security.py - Updated server_port fixture
  • tests/server/test_sse_security.py - Updated server_port fixture
  • tests/shared/test_streamable_http.py - Updated 3 port fixtures (basic, json, event server)
  • tests/shared/test_sse.py - Updated server_port fixture
  • tests/shared/test_ws.py - Updated server_port fixture
  • tests/client/test_http_unicode.py - Updated unicode_server_port fixture
  • tests/client/test_notification_response.py - Updated non_sdk_server_port fixture
  • tests/test_test_helpers.py - Fixed socket cleanup to prevent ResourceWarning

Implementation details:
All affected fixtures now use get_worker_specific_port(worker_id) instead of the race-prone socket.bind(("127.0.0.1", 0)) pattern. The helper function allocates worker-specific port ranges ensuring no overlap between parallel test workers.

Fixes a TOCTOU (Time-of-Check-Time-of-Use) race condition where multiple
pytest-xdist workers would attempt to bind to the same port, causing tests
to fail with "address already in use" errors or connect to the wrong server.

## Root Cause
The server_port() fixture would:
1. Bind to port 0 to get a free port
2. Immediately close the socket, freeing the port
3. Return the port number

When tests ran in parallel with pytest-xdist, multiple workers could get
the same "free" port between steps 2 and 3, leading to conflicts.

## Solution
Implement worker-specific port ranges using pytest-xdist's worker_id:
- Each worker gets a dedicated range from 40000-49999
- Port ranges are calculated based on PYTEST_XDIST_WORKER_COUNT
- Guarantees no overlap between workers

## Changes
- Add parse_worker_index() to extract worker number from worker_id
- Add calculate_port_range() to compute non-overlapping port ranges
- Refactor get_worker_specific_port() to use the pure functions above
- Update server_port fixture to use worker-specific ports
- Add comprehensive unit tests (28 tests) covering all edge cases

Tests now pass reliably with 14 workers in parallel.
Extended the fix from test_integration.py to all test files that suffered
from the same TOCTOU (Time-of-Check-Time-of-Use) race condition. When
running with pytest-xdist parallel workers, multiple workers would get
the same "free" port from `socket.bind(("127.0.0.1", 0))`, leading to:
- "address already in use" errors
- Tests connecting to wrong servers

Solution:
- Updated 7 test files to use `get_worker_specific_port(worker_id)`:
  * tests/server/test_streamable_http_security.py
  * tests/server/test_sse_security.py
  * tests/shared/test_streamable_http.py (3 fixtures)
  * tests/shared/test_sse.py
  * tests/shared/test_ws.py
  * tests/client/test_http_unicode.py
  * tests/client/test_notification_response.py
- Fixed socket cleanup issue in tests/test_test_helpers.py

Each worker now gets an exclusive port range from 40000-59999, eliminating
the race condition across all test files that start servers.

Testing:
- All 703 tests pass with pytest-xdist parallel execution
- Port allocation now consistent across all parallel workers
@maxisbey maxisbey added bug Something isn't working P2 Moderate issues affecting some users, edge cases, potentially valuable feature labels Oct 28, 2025
@maxisbey maxisbey marked this pull request as draft October 28, 2025 23:39
@maxisbey
Copy link
Contributor Author

maxisbey commented Oct 28, 2025

actually, I don't like this approach anymore.

It's a bit late at night so I'm going to leave it as draft for now, but some other options I was thinking are:

  • Adding more port randomisation - although this only just reduces the likelihood of port collisions.
  • Using a sub process with a mutex or shared memory or something, have that process hand out port numbers - although this means we have a limited number of ports, so we'd need a way to release them over time
  • Using a FileLock to allocate ports per test

Although none of them really are clean IMO. So going to leave this for now

@felixweinberger felixweinberger added the pending publish Draft PRs need to be published for team to review label Oct 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working P2 Moderate issues affecting some users, edge cases, potentially valuable feature pending publish Draft PRs need to be published for team to review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants